A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations
Modern deep neural networks, particularly recent large language models, come
with massive model sizes that require significant computational and storage
resources. To enable the deployment of modern models on resource-constrained
environments and accelerate inference time, researchers have increasingly
explored pruning techniques as a popular research direction in neural network
compression. However, there is a dearth of up-to-date comprehensive review
papers on pruning. To address this issue, in this survey, we provide a
comprehensive review of existing research works on deep neural network pruning
in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to
prune, and 4) fusion of pruning and other compression techniques. We then
provide a thorough comparative analysis of seven pairs of contrast settings for
pruning (e.g., unstructured/structured) and explore emerging topics, including
post-training pruning, different levels of supervision for pruning, and broader
applications (e.g., adversarial robustness) to shed light on the commonalities
and differences of existing methods and lay the foundation for further method
development. To facilitate future research, we build a curated collection of
datasets, networks, and evaluations on different applications. Finally, we
provide some recommendations on selecting pruning methods and outline
promising research directions. We build a repository at
https://github.com/hrcheng1066/awesome-pruning
Influence Function Based Second-Order Channel Pruning: Evaluating True Loss Changes for Pruning Is Possible Without Retraining
A challenge of channel pruning is designing efficient and effective criteria
for selecting the channels to prune. A widely used criterion is minimal
performance degradation. Accurately evaluating the true performance
degradation requires retraining the surviving weights to convergence, which is
prohibitively slow. Hence, existing pruning methods use the pre-pruning
weights (without retraining) to evaluate the performance degradation. However,
we observe that the loss changes differ significantly with and without
retraining. This motivates us to develop a technique to evaluate true loss
changes without retraining, with which channels to prune can be selected more
reliably and confidently. We first derive a closed-form estimator of the true
loss change per pruning mask change, using influence functions, without
retraining. The influence function, a tool from robust statistics, reveals the
impact of a training sample on the model's predictions; we repurpose it to
assess impacts on true loss changes. We then show
how to assess the importance of all channels simultaneously and develop a novel
global channel pruning algorithm accordingly. We conduct extensive experiments
to verify the effectiveness of the proposed algorithm. To the best of our
knowledge, we are the first to show that evaluating true loss changes for
pruning without retraining is possible. This finding will open up
opportunities for new pruning paradigms that differ from existing methods.
The code is available at https://github.com/hrcheng1066/IFSO.
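The core idea can be illustrated with a toy sketch (this is not the paper's IFSO algorithm; the quadratic loss, the matrix A, and the per-weight score are all illustrative assumptions): for a quadratic surrogate loss, a second-order Taylor expansion around the trained weights gives the loss change from zeroing a weight exactly, with no retraining.

```python
import numpy as np

def loss(w, A, b):
    # Quadratic stand-in for a network loss: L(w) = 0.5 w^T A w - b^T w
    return 0.5 * w @ A @ w - b @ w

def second_order_prune_score(w, A, b, i):
    """Second-order Taylor estimate of the loss change from zeroing
    weight i (exact for a quadratic loss), with no retraining."""
    g = A @ w - b              # gradient of the quadratic loss
    dw = np.zeros_like(w)
    dw[i] = -w[i]              # mask change: weight i -> 0
    return g @ dw + 0.5 * dw @ A @ dw   # Hessian of this loss is A

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)    # symmetric positive-definite Hessian
b = rng.normal(size=4)
w = np.linalg.solve(A, b)      # "trained" weights: gradient vanishes here

scores = [second_order_prune_score(w, A, b, i) for i in range(4)]
least_harmful = int(np.argmin(scores))   # weight whose removal hurts least
```

Because the surrogate is quadratic, the Taylor estimate matches the true loss change exactly; for a real network the analogous second-order estimate is only an approximation.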
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation
We humans can impeccably search for a target object, given only its name,
even in an unseen environment. We argue that this ability is largely due to
three factors: the incorporation of prior knowledge (or experience), its
adaptation to the new environment using observed visual cues, and, most
importantly, searching optimistically without giving up early. This is currently
missing in the state-of-the-art visual navigation methods based on
Reinforcement Learning (RL). In this paper, we propose to use externally
learned prior knowledge of the relative object locations and integrate it into
our model by constructing a neural graph. In order to efficiently incorporate
the graph without increasing the state-space complexity, we propose our
Graph-based Value Estimation (GVE) module. GVE provides a more accurate
baseline for estimating the advantage function in actor-critic RL algorithms.
This results in reduced value estimation error and, consequently, convergence
to a better policy. Through empirical studies, we show that our agent,
dubbed the optimistic agent, has a more realistic estimate of the state
value during a navigation episode, which leads to a higher success rate. Our
extensive ablation studies show the efficacy of our simple method, which
achieves state-of-the-art results measured by the conventional visual
navigation metrics, e.g., Success Rate (SR) and Success weighted by Path Length
(SPL), in the AI2THOR environment.
Comment: Accepted for publication at WACV 202
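The role of a value baseline in advantage estimation can be sketched generically (this is standard one-step actor-critic machinery, not the paper's graph-based GVE module; the rewards and values below are made-up numbers):

```python
import numpy as np

def td_advantages(rewards, values, gamma=0.99):
    """One-step advantage estimates A_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    `values` carries one extra entry for the bootstrap/terminal state."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    return rewards + gamma * values[1:] - values[:-1]

# Three-step episode; the final value of 0.0 marks the terminal state.
adv = td_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6, 0.0], gamma=0.9)
```

The more accurate the baseline V, the lower the variance of these advantage estimates, which is the mechanism by which a better value estimate speeds convergence.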
Factor Graph Neural Networks
In recent years, we have witnessed a surge of Graph Neural Networks (GNNs),
most of which can learn powerful representations in an end-to-end fashion with
great success in many real-world applications. They bear resemblance to
Probabilistic Graphical Models (PGMs), but break free from some limitations of
PGMs. By aiming to provide expressive methods for representation learning
instead of computing marginals or most likely configurations, GNNs provide
flexibility in the choice of information flowing rules while maintaining good
performance. Despite their success and inspiration, they lack efficient ways
to represent and learn higher-order relations among variables/nodes. More
expressive higher-order GNNs which operate on k-tuples of nodes need increased
computational resources in order to process higher-order tensors. We propose
Factor Graph Neural Networks (FGNNs) to effectively capture higher-order
relations for inference and learning. To do so, we first derive an efficient
approximate Sum-Product loopy belief propagation inference algorithm for
discrete higher-order PGMs. We then neuralize the novel message passing scheme
into a Factor Graph Neural Network (FGNN) module by allowing richer
representations of the message update rules; this facilitates both efficient
inference and powerful end-to-end learning. We further show that with a
suitable choice of message aggregation operators, our FGNN is also able to
represent Max-Product belief propagation, providing a single family of
architecture that can represent both Max and Sum-Product loopy belief
propagation. Our extensive experimental evaluation on synthetic as well as real
datasets demonstrates the potential of the proposed model.
Comment: Accepted by JML
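A minimal sum-product computation on a tiny factor graph illustrates the inference primitive that FGNN neuralizes (the graph here is a tree, so the result is exact; the potentials are made-up numbers, and this is not the paper's message-passing scheme):

```python
import numpy as np

# Two binary variables x1, x2; unary factors phi1, phi2 and one
# pairwise factor psi that favours agreement between the variables.
phi1 = np.array([0.7, 0.3])
phi2 = np.array([0.4, 0.6])
psi = np.array([[1.0, 0.2],
                [0.2, 1.0]])

# Sum-product message from the pairwise factor to x1: sum out x2.
msg_to_x1 = psi @ phi2
p_x1 = phi1 * msg_to_x1
p_x1 /= p_x1.sum()             # marginal p(x1)

# Brute-force check against the full joint distribution.
joint = phi1[:, None] * phi2[None, :] * psi
p_x1_exact = joint.sum(axis=1) / joint.sum()
```

Replacing the sum in the message with a max yields Max-Product messages, which is the aggregation-operator switch the abstract refers to.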
Identifying Latent Causal Content for Multi-Source Domain Adaptation
Multi-source domain adaptation (MSDA) learns to predict the labels in target
domain data, under the setting that data from multiple source domains are
labelled and data from the target domain are unlabelled. Most methods for this
task focus on learning invariant representations across domains. However, their
success relies heavily on the assumption that the label distribution remains
consistent across domains, which may not hold in general real-world problems.
In this paper, we propose a new and more flexible assumption, termed
\textit{latent covariate shift}, where a latent content variable
and a latent style variable are introduced in the generative
process, with the marginal distribution of the style variable changing across
domains and the conditional distribution of the label given the content
variable remaining invariant across domains. We show that although (completely)
identifying the proposed latent causal model is challenging, the latent content
variable can be identified up to scaling by exploiting its dependence on labels
from source domains, together with the identifiability conditions of nonlinear
ICA. This motivates us to propose a novel method for MSDA, which learns the
invariant label distribution conditional on the latent content variable,
instead of learning invariant representations. Empirical evaluation on
simulated and real data demonstrates the effectiveness of the proposed method.
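A toy generator can illustrate the latent covariate shift assumption (the distributions, dimensions, and coefficients here are illustrative, not the paper's model): the style distribution shifts across domains, while the label depends only on the invariant content variable.

```python
import numpy as np

def sample_domain(n, style_mean, rng):
    """Toy latent-covariate-shift generator: the style distribution
    shifts across domains, while p(label | content) stays fixed."""
    content = rng.normal(0.0, 1.0, size=n)        # invariant across domains
    style = rng.normal(style_mean, 1.0, size=n)   # marginal shifts per domain
    x = np.stack([content + 0.5 * style, style], axis=1)  # observed features
    y = (content > 0).astype(int)                 # label depends on content only
    return x, y

rng = np.random.default_rng(0)
x_src, y_src = sample_domain(2000, style_mean=-2.0, rng=rng)
x_tgt, y_tgt = sample_domain(2000, style_mean=+2.0, rng=rng)
```

A representation that recovers the content variable would predict labels equally well in both domains, even though the observed feature distributions differ.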
Semantic Role Labeling Guided Out-of-distribution Detection
Identifying unexpected domain-shifted instances in natural language
processing is crucial in real-world applications. Previous works identify
OOD instances by leveraging a single global feature embedding to represent the
sentence, which cannot characterize subtle OOD patterns well. Another major
challenge current OOD methods face is learning effective low-dimensional
sentence representations to identify the hard OOD instances that are
semantically similar to the ID data. In this paper, we propose a new
unsupervised OOD detection method, namely Semantic Role Labeling Guided
Out-of-distribution Detection (SRLOOD), that separates, extracts, and learns
the semantic role labeling (SRL) guided fine-grained local feature
representations from different arguments of a sentence and the global feature
representations of the full sentence using a margin-based contrastive loss. A
novel self-supervised approach is also introduced to enhance such global-local
feature learning by predicting the SRL extracted role. The resulting model
achieves SOTA performance on four OOD benchmarks, indicating the effectiveness
of our approach. Code will be made available upon acceptance.
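A minimal sketch of distance-based OOD scoring over sentence embeddings (a generic nearest-neighbour cosine score, not the SRLOOD method; the embeddings below are random stand-ins):

```python
import numpy as np

def ood_score(query, id_bank):
    """Score an embedding as OOD via one minus its maximum cosine
    similarity to a bank of in-distribution (ID) embeddings."""
    q = query / np.linalg.norm(query)
    bank = id_bank / np.linalg.norm(id_bank, axis=1, keepdims=True)
    return 1.0 - float(np.max(bank @ q))

rng = np.random.default_rng(0)
id_bank = rng.normal(size=(50, 8))                  # stand-in ID embeddings
near_id = id_bank[0] + 0.01 * rng.normal(size=8)    # close to the ID data
far_ood = 10.0 * np.ones(8)                         # far from the ID data
```

Methods like the one in the abstract aim to learn embeddings under which this kind of score separates hard near-ID OOD instances, rather than relying on a single global sentence vector.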
A Computational Approach for Mapping Electrochemical Activity of Multi-principal Element Alloys
Multi-principal element alloys (MPEAs) comprise an atypical class of metal alloys. MPEAs have been demonstrated to possess several exceptional properties, including, most relevant to the present study, high corrosion resistance. In the context of MPEA design, the vast number of potential alloying elements and the staggering number of elemental combinations favours a computational alloy design approach. In this study, an approach was developed to computationally assess the prospective corrosion performance of MPEAs. A density functional theory (DFT)-based Monte Carlo (MC) method was used to develop the MPEA ‘structure’, with the AlCrTiV alloy used as a model. High-throughput DFT calculations were performed to create training datasets for surface activity/selectivity towards different adsorbate species: O2-, Cl- and H+. Machine learning (ML) with a combined representation was then utilised to predict the adsorption and vacancy energies as descriptors for surface activity/selectivity. The capability of the combined MC, DFT and ML computational methods as a virtual electrochemical performance simulator for MPEAs was established, and the approach may be useful in exploring other MPEAs.
Stock Market Prediction via Deep Learning Techniques: A Survey
Stock market prediction has been a traditional yet complex problem
researched within diverse research areas and application domains due to its
non-linear, highly volatile and complex nature. Existing surveys on stock
market prediction often focus on traditional machine learning methods rather
than deep learning methods. Yet deep learning has come to dominate many
domains and, in recent years, has gained much success and popularity in stock
market prediction. This
motivates us to provide a structured and comprehensive overview of the research
on stock market prediction focusing on deep learning techniques. We present
four elaborated subtasks of stock market prediction and propose a novel
taxonomy to summarize the state-of-the-art models based on deep neural networks
from 2011 to 2022. In addition, we also provide detailed statistics on the
datasets and evaluation metrics commonly used in the stock market. Finally, we
highlight some open issues and point out several future directions by sharing
some new perspectives on stock market prediction.
Identifiable Latent Polynomial Causal Models Through the Lens of Change
Causal representation learning aims to unveil latent high-level causal
representations from observed low-level data. One of its primary tasks is to
provide reliable assurance of identifying these latent causal models, known as
identifiability. A recent breakthrough explores identifiability by leveraging
the change of causal influences among latent causal variables across multiple
environments \citep{liu2022identifying}. However, this progress rests on the
assumption that the causal relationships among latent causal variables adhere
strictly to linear Gaussian models. In this paper, we extend the scope of
latent causal models to involve nonlinear causal relationships, represented by
polynomial models, and general noise distributions conforming to the
exponential family. Additionally, we investigate the necessity of imposing
changes on all causal parameters and present partial identifiability results
when part of them remains unchanged. Further, we propose a novel empirical
estimation method, grounded in our theoretical finding, that enables learning
consistent latent causal representations. Our experimental results, obtained
from both synthetic and real-world data, validate our theoretical contributions
concerning identifiability and consistency.